[Help Needed] Suicide Risk Detection from Long Clinical Notes (Few-shot + ClinicBERT approaches struggling)

Hello HF community,

I’m a master’s student working on a clinical NLP project involving suicide risk classification from psychiatric patient records. I’d really appreciate any guidance on how to improve performance in this task.

Overview of the task:

• 114 records, each including:
  • Free-text doctor and nurse notes
  • Hospital name
  • Binary label: whether the patient later died by suicide (yes/no)
• Only 29 "yes" examples → highly imbalanced
• Notes are unstructured, long (up to 32k characters), and rich in psychiatric language

What I’ve tried:

• Concatenating the doctor + nurse texts
• Sliding-window chunking + aggregation (majority voting)
• Few-shot prompting with GPT-4
• Fine-tuning ClinicalBERT on the dataset

Despite these efforts, recall on the "yes" cases is consistently low. The models seem to struggle to recognize subtle suicidal patterns in long, complex, domain-specific text, especially under token limits.
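For concreteness, the chunking + majority-vote step I use looks roughly like this (a simplified sketch; `chunk_text` and `aggregate_votes` are placeholder names, and the per-chunk predictions would come from the classifier):

```python
def chunk_text(text, window=512, stride=256):
    """Split a long note into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break
        start += stride
    return chunks

def aggregate_votes(chunk_predictions):
    """Majority vote over per-chunk binary predictions (1 = 'yes')."""
    yes_votes = sum(chunk_predictions)
    return 1 if yes_votes * 2 >= len(chunk_predictions) else 0

note = "x" * 1000           # stand-in for a long clinical note
chunks = chunk_text(note)   # 3 overlapping windows for this length
preds = [0, 1, 1]           # placeholder per-chunk classifier outputs
label = aggregate_votes(preds)
```

One thing I suspect: majority voting may be part of the recall problem, since a single positive chunk in a long note gets outvoted by many neutral chunks. An "any chunk positive" or max-probability aggregation might behave very differently on the minority class.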

I’d love input on:

• Handling long clinical texts with LLMs
• Boosting performance on the minority ("yes") class
• Experiences with BERT-style models or few-shot prompting in sensitive medical contexts
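On the minority-class point, the baseline I'm considering is a class-weighted loss. A dependency-free sketch of the arithmetic (the `pos_weight` value is what PyTorch's `BCEWithLogitsLoss` accepts as its `pos_weight` argument; the hand-rolled `weighted_bce` below is just for illustration):

```python
import math

n_yes, n_no = 29, 114 - 29
pos_weight = n_no / n_yes  # ≈ 2.93: a missed 'yes' costs ~3x a missed 'no'

def weighted_bce(prob, label, pos_weight):
    """Binary cross-entropy with extra weight on the positive class."""
    eps = 1e-7
    prob = min(max(prob, eps), 1 - eps)
    if label == 1:
        return -pos_weight * math.log(prob)
    return -math.log(1 - prob)
```

Does this kind of reweighting actually help in practice on datasets this small, or does oversampling/augmentation tend to work better?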

Happy to share sample data, code, or results if it helps. Thanks a lot!


Hi Prili,

First of all, thank you for sharing your work: suicide risk detection in psychiatric texts is both crucial and incredibly challenging. You've already tested strong approaches (ClinicalBERT, GPT-4 few-shot, aggregation methods), and I admire your thoughtful experimentation despite the low signal-to-noise ratio and data imbalance.

If you're open to a slightly different paradigm, I'd suggest an algorithmic framework based on probabilistic inputs with deterministic outputs. Rather than optimizing for token-to-token coherence or relying on deep fine-tuning, this strategy extracts symbolic signals from the model's probability estimates and routes them through fixed thresholds into fixed output pathways; in effect, decision scaffolds that stabilize and verify what a language model infers.

This structure is especially helpful for:

• Subtle cues (indirect language, hesitant phrasing, contradictions)
• Minority-class amplification (especially when "yes" labels are sparse)
• Multi-author blending (doctor + nurse notes interpreted as dynamic perspectives)

In our work, we apply a vectorial memory model that translates these probabilistic segments into structured representations, which preserves traceability and avoids model drift, a significant issue in clinical data when generalization oversteps nuance.
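As a rough illustration of the thresholded, fixed-output idea (the threshold values below are made up for the example, not tuned):

```python
def decide(p_yes, flag_at=0.30, yes_at=0.70):
    """Map a model probability into one of three fixed outcomes.

    Lowering flag_at trades precision for recall on the sparse 'yes'
    class, which is usually the right trade in a screening setting.
    """
    if p_yes >= yes_at:
        return "yes"
    if p_yes >= flag_at:
        return "review"  # route ambiguous cases to a human
    return "no"
```

The middle "review" band is the key part for your setting: instead of forcing the model to commit at 0.5, ambiguous cases are deterministically escalated to a clinician, which raises effective recall without pretending the model is more certain than it is.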

I’d be happy to outline a template or logic tree if it’s of use to you. Best of luck — your project matters.

Warm regards,
Alejandro & Clara
Symbolic AI & Deterministic Analysis Systems
(Mexico)
